Efficient Realization of Givens Rotation through Algorithm-Architecture Co-design for Acceleration of QR Factorization
Authors
Abstract
We present an efficient realization of Generalized Givens Rotation (GGR) based QR factorization that achieves 3-100x better performance, in terms of Gflops/watt, than state-of-the-art realizations on multicore processors and General Purpose Graphics Processing Units (GPGPUs). GGR is an improvement over the classical Givens Rotation (GR) operation and can annihilate multiple elements of the rows and columns of an input matrix simultaneously. GGR requires 33% fewer multiplications than GR. For a custom implementation of GGR, we identify macro operations in GGR and realize them on a Reconfigurable Data-path (RDP) tightly coupled to the pipeline of a Processing Element (PE). In the PE, GGR attains a speed-up of 1.1x over the Modified Householder Transform (MHT) presented in the literature. For a parallel realization of GGR, we use REDEFINE, a scalable, massively parallel Coarse-grained Reconfigurable Architecture, and show that the speed-up attained is commensurate with the hardware resources in REDEFINE. Counter-intuitively, GGR also outperforms General Matrix Multiplication (gemm) by 10% in terms of Gflops/watt.
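For context on the baseline that GGR improves upon, the following is a minimal pure-Python sketch of classical Givens-rotation QR, in which each rotation annihilates exactly one element. It is not the GGR algorithm, whose details are not given in this abstract, and the names `givens` and `givens_qr` are illustrative assumptions rather than anything taken from the paper.

```python
import math


def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0  # nothing to annihilate: identity rotation
    r = math.hypot(a, b)
    return a / r, b / r


def givens_qr(A):
    """Classical Givens QR of an m x n matrix given as a list of rows.

    Returns (Q, R) with Q (m x m) orthogonal, R (m x n) upper triangular,
    and A = Q R up to rounding. Illustrative sketch, not the paper's GGR.
    """
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]
    # Qt accumulates the product of all rotations, i.e. Q transposed.
    Qt = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

    for j in range(min(m - 1, n)):        # column being cleared
        for i in range(m - 1, j, -1):     # annihilate R[i][j], bottom up
            c, s = givens(R[i - 1][j], R[i][j])
            # Apply the same rotation to rows i-1 and i of both R and Qt.
            for M in (R, Qt):
                for k in range(len(M[0])):
                    upper, lower = M[i - 1][k], M[i][k]
                    M[i - 1][k] = c * upper + s * lower
                    M[i][k] = -s * upper + c * lower
            R[i][j] = 0.0                 # remove rounding residue

    Q = [[Qt[j][i] for j in range(m)] for i in range(m)]
    return Q, R


if __name__ == "__main__":
    Q, R = givens_qr([[6, 5, 0], [5, 1, 4], [0, 4, 3]])
    for row in R:
        print(row)
```

Each pass of the inner loop zeroes a single subdiagonal entry, so an m x n matrix needs one rotation per entry below the diagonal. That one-element-at-a-time pattern is what the abstract contrasts with GGR's simultaneous annihilation of multiple elements.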
Similar resources
Efficient Realization of Householder Transform through Algorithm-Architecture Co-design for Acceleration of QR Factorization
QR factorization is a ubiquitous operation in many engineering and scientific applications. In this paper, we present an efficient realization of Householder Transform (HT) based QR factorization through algorithm-architecture co-design, where we achieve a performance improvement of 3-90x in terms of Gflops/watt over state-of-the-art multicore, General Purpose Graphics Processing Units (GPGPUs), Fiel...
Asymptotic properties of the QR factorization of banded Hessenberg-Toeplitz matrices
We consider the Givens QR factorization of banded Hessenberg-Toeplitz matrices of large order and relatively small bandwidth. We investigate the asymptotic behavior of the R factor and the Givens rotation when the order of the matrix goes to infinity, and present some interesting convergence properties. These properties can lead to savings in the computation of the exact QR factorization and gi...
Fault Tolerant Givens Rotations Method and its Utilization for Matrix QR-Decomposition
Fault-tolerant algorithms based on Givens rotations and modified weighted checksum methods are proposed for matrix QR-decomposition. The purpose is to detect and correct calculation errors that occur due to transient hardware faults during computation. The proposed algorithms make it possible to correct a single error among the elements of each column or row of an input matrix A(M,N) (M=N for QR-algorit...
Recycling Givens Rotations for the Efficient Approximation of Pseudospectra of Band-dominated Operators
We study spectra and pseudospectra of certain bounded linear operators on ℓ²(Z). The operators are generally non-normal, and their matrix representation has a characteristic off-diagonal decay. Based on a result of Chandler-Wilde, Chonchaiya and Lindner for tridiagonal infinite matrices, we demonstrate an efficient algorithm for the computation of upper and lower bounds on the pseudospectrum of ...
Towards Faster Givens Rotations Based Power System State Estimator
Numerically stable and computationally efficient Power System State Estimation (PSSE) algorithms are designed using the Orthogonalization (QR decomposition) approach. They use Givens rotations for orthogonalization, which enables sparsity exploitation during factorization of the large sparse augmented Jacobian. A priori row and column ordering is usually performed to reduce intermediate and ove...